Abstract. There has been a split in the statistics community about the 
need for taking covariates into account in the design phase of a clinical 
trial. There are many advocates of using stratification and covariate- 
adaptive randomization to promote balance on certain known covari- 
ates. However, balance does not always promote efficiency or ensure 
more patients are assigned to the better treatment. We describe these 
procedures, including model-based procedures, for incorporating covari- 
ates into the design of clinical trials, and give examples where balance, 
efficiency and ethical considerations may be in conflict. We advocate a 
new class of procedures, covariate-adjusted response-adaptive (CARA) 
randomization procedures that attempt to optimize both efficiency and 
ethical considerations, while maintaining randomization. We review all 
these procedures, present a few new simulation studies, and conclude 
with our philosophy. 

Key words and phrases: Balance, covariate-adaptive randomization, 
covariate-adjusted response-adaptive randomization, efficiency, ethics. 



1. INTRODUCTION 

Clinical trials are often considered the "gold stan- 
dard" in convincing the medical community that a 
therapy is beneficial in practice. However, not all 
clinical trials have been universally convincing. Tri- 
als that have inadequate power, or incorrect assump- 
tions made in planning for power, imbalances on 
important baseline covariates directly related to pa- 
tient outcomes, or heterogeneity in the patient pop- 
ulation, have contributed to a lack of scientific con- 
sensus. Hence, it is generally recognized that the 
planning and design stage of the clinical trial is of 
great importance. While the implementation of the 
clinical trial can often take years, incorrect assump- 
tions and forgotten factors in the sometimes rushed 
design phase can cause controversy following a trial. 
For example, take the trial of erythropoietin in main- 
taining normal hemoglobin concentrations in patients 
with metastatic breast cancer (Leyland-Jones, 2003). 
This massive scientific effort involved 139 clinical 
sites and 939 patients. The study was terminated 
early because of an increase in mortality in the ery- 
thropoietin group. The principal investigator 
explains: 

. . . drawing definitive conclusions has been 
difficult because the study was not de- 
signed to prospectively collect data on many 
potential prognostic survival factors that 
might have affected the study outcome. . . . 
The results of this trial must be inter- 
preted with caution in light of the poten- 
tial for an imbalance of risk factors be- 
tween treatment groups. . . . The randomi- 
sation design of the study may not have 
fully protected against imbalances because 
the stratification was only done for one 
parameter, . . . and was not done at each 
participating centre. ... It is extremely un- 
fortunate that problems in design. . . have 
complicated the interpretation of this study. 
Given the number of design issues uncov- 
ered in the post hoc analysis, the results 
cannot be considered conclusive. 

An accompanying commentary calls this article 
"alarmist," thus illustrating the scientific conundrum 
that covariates present in clinical trials. There is no 
agreement in the statistical community about how 
to deal with potentially important baseline covari- 
ates in the design phase of the trial. Traditionally, 
prestratification has been used on a small number 
of very important covariates, followed by stratified 
analyses. But what if the investigator feels there are 
many covariates that are important — too many, in 
fact, to feasibly use prestratification? 

The very act of randomization tends to mitigate 
the probability that important covariates will be dis- 
tributed differently among treatment groups. This 
property is what distinguishes randomized clinical 
trials from observational studies. However, this is a 
large sample property, and every clinical trialist is 
aware of randomized trials that resulted in signif- 
icant baseline covariate imbalances. Grizzle (1982) 
distinguished two factions of the statistical commu- 
nity, the "splitters" and the "lumpers." The splitters 
recommend incorporating important covariates into 
randomization, thus ensuring balance over these co- 
variates at the design stage. The lumpers suggest 
ignoring covariates in the design and use simple ran- 
domization to allocate subjects to different treat- 
ment groups, and adjust for covariates at the analy- 
sis stage. As Nathan Mantel once pointed out (Gail, 
1992): 

. . . After looking at a data set, I might see 
that in one group there are an unusually 
large number of males. I would point out 
to the investigators that even though they 
had randomized the individuals to treat- 
ments, or claimed that they had, I could 
still see that there was something unbal- 
anced. And the response I would get was 
"Well, we randomized and therefore we 
don't have to bother about it." But that 
isn't true. So, as long as the imbalance 
is an important factor you should take it 
into account. Even though it is a designed 
experiment, in working with humans, you 
cannot count on just the fact that you ran- 
domized. 

Today, many statisticians would argue that the only 
legitimate adjusted analyses are for prespecified im- 
portant covariates planned for in the analysis ac- 
cording to protocol, and that these adjustments should 
be done whether or not the distributions are imbal- 
anced (e.g., Permutt, 2000). In addition, these co- 
variates should be accounted for in the design of the 
trial, usually by prestratification, if possible. 

The three-stage philosophy of prestratifying on 
important known covariates, followed by a stratified 
analysis, and allowing for randomization to "take 
care of" the other less important (or unknown) co- 
variates, has become a general standard in clinical 
trials. This method breaks down, however, when 
there are a large number of important covariates. 
This has led to the introduction of covariate-adaptive 
randomization procedures, sometimes referred to as 
minimization procedures or dynamic allocation.[1] Some 
of the "covariate-adaptive" procedures (the term 
we will use) that have been proposed are ran- 
domized, and others are not. 

There is no consensus in either the statistics world 
or the clinical trials world as to whether and when 
these covariate-adaptive procedures should be used, 
although they are gaining in popularity and are now 
used frequently. Recently clinical trialists using these 
procedures have grown concerned that regulatory 
agencies have expressed skepticism and caution about 
the use of these techniques. In Europe, The Com- 
mittee on Proprietary Medicinal Products (CPMP) 
Points to Consider Document (see Grouin, Day and 
Lewis, 2004) states: 

Dynamic allocation is strongly discouraged. . . . 
Without adequate and appropriate supporting/sensitivity 
analysis, an application is unlikely to be successful. 

This document has led to much controversy. In a 
commentary, Buyse and McEntegart (2004) state: 

In our view, the CPMP's position is un- 
fair, unfounded, and unwise. . . . It favors 
the use of randomization methods that ex- 
pose trialists and the medical community 
to the risk of accidental bias, when the risk 
could have been limited through the use 
of balancing methods that are especially 
valuable. . . . If there were any controversy 
over the use of minimization, it would be 
expected of an independent agency to weigh 
all the scientific arguments, for and against 
minimization, before castigating the use 
of a method that has long been adopted 
in the clinical community. 

Footnote 1: Or sometimes, unfortunately, as just adaptive designs, 
which could refer to any number of statistical methods having 
nothing to do with covariates, including response-adaptive 
randomization, sequential monitoring, and flexible interim 
decisions. 

In a letter to the editor, Day, Grouin and Lewis 
(2005) respond that 

. . . the scientific community is not of one 
mind regarding the use of covariate-adaptive 
randomization procedures. . . . Rosenberger 
and Lachin cautiously state that "very lit- 
tle is known about its theoretical proper- 
ties." This is a substantial point. The di- 
rect theoretical link between randomiza- 
tion and methods of statistical analysis 
has provided a solid foundation for reli- 
able conclusions from clinical trial work 
for many years. 

It is in the context of this controversy that this 
paper is written. The intention of this paper is to 
explore the role of covariates in the design of clini- 
cal trials, and to examine the burgeoning folklore in 
this area among practicing clinical trialists. Just be- 
cause a technique is widely used does not mean that 
it is valuable. And just because there is little theo- 
retical evidence validating a method does not mean 
it is not valid. The nonspecificity of the language 
in these opinion pieces is becoming troubling: what 
is meant by the terms "minimization," "dynamic," 
"adaptive"? Many procedures to mitigate covariate 
imbalances have been proposed. Are they all equally 
effective or equally inappropriate? We add to the 
controversy by discussing the often competing crite- 
ria of balance, efficiency and ethical considerations. 
We demonstrate by example that clinical trials that 
balance on known covariates may not always lead 
to the most efficient or the most ethically attractive 
design, and vice versa. 

This paper serves as both a review and a summary 
of some of our thoughts on the matter; in particu- 
lar, we advocate a new class of procedures called 
covariate-adjusted response-adaptive (CARA) ran- 
domization procedures (e.g., Hu and Rosenberger, 
2006). The outline of the paper is as follows. In 
Section 2, we review the most popular covariate- 
adaptive randomization procedures. In Section 3, 
we describe randomization-based inference and its 
relationship to clinical trials employing covariate- 
adaptive randomization methods. In Section 4, we 
discuss what is known from the literature about the 
properties of the procedures in Section 2. In Sec- 
tion 5, we describe the alternative model-based op- 
timal design approach to the problem and describe 
properties of these procedures in Section 6. In Sec- 
tion 7, we discuss the relationship between balance, 
efficiency and ethics, and describe philosophical ar- 
guments about whether balance or efficiency is a 
more important criterion. We demonstrate by ex- 
ample that balance does not necessarily imply ef- 
ficiency and vice versa, and demonstrate that bal- 
anced and efficient designs do not necessarily place 
more patients on the better treatment. In Section 8, 
we describe CARA randomization procedures and 
their properties. In Section 9, we report the results of 
a simulation study comparing different CARA and 
covariate-adaptive randomization procedures for a 
binary response trial with covariates. Finally, we 
give a summary of our own opinions in Section 10. 

2. COVARIATE-ADAPTIVE RANDOMIZATION 

Following Rosenberger and Lachin (2002), a randomization sequence 
for a two-treatment clinical trial of $n$ patients is a random vector 
$\mathbf{T}_n = (T_1, \ldots, T_n)'$, where $T_j = 1$ if the $j$th 
patient is assigned to treatment 1 and $T_j = -1$ if the patient is 
assigned to treatment 2. A restricted randomization procedure is 
given by $\phi_{j+1} = \Pr(T_{j+1} = 1 \mid \mathbf{T}_j)$, that is, 
the probability that the $(j+1)$th patient is assigned to treatment 1, 
given the previous $j$ assignments. When the randomization sequence 
is dependent on a patient's covariate vector $\mathbf{Z}$, we have 
covariate-adaptive randomization. In particular, the randomization 
procedure can then be described by $\phi_{j+1} = \Pr(T_{j+1} = 1 \mid 
\mathbf{T}_j, \mathbf{Z}_1, \ldots, \mathbf{Z}_{j+1})$, noting that 
the current patient is randomized based on the history of previous 
treatment assignments, the covariate vectors of past patients and the 
current patient's covariate vector. The goal of covariate-adaptive 
randomization is to adaptively balance the covariate profiles of 
patients randomized to treatments 1 and 2. Most techniques for doing 
so have focused on minimizing the differences of numbers on 
treatments 1 and 2 across strata, often marginally. Note that 
covariate-adaptive randomization induces a complex covariance 
structure, given by $\mathrm{Var}(\mathbf{T}_n \mid \mathbf{Z}_1 = 
\mathbf{z}_1, \ldots, \mathbf{Z}_n = \mathbf{z}_n) = 
\boldsymbol{\Sigma}_{n,\mathbf{z}}$. 

For a small set of known discrete covariates, pre- 
stratification is the most effective method for forcing 
balance with respect to those covariates across the 
treatment groups. The technique of prestratification 
uses a separate restricted randomization procedure 
within each stratum. For notational purposes, if dis- 
crete covariate $Z_i$, $i = 1, \ldots, K$, has $k_i$ levels, then 
restricted randomization is used within each of the 
$\prod_{i=1}^{K} k_i$ strata. 
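
To see how quickly the number of strata grows, the following Python sketch counts and enumerates the strata for a given set of covariates. It is only an illustration: the level counts and the helper name `strata` are hypothetical and not taken from any trial discussed here.

```python
from itertools import product
from math import prod

def strata(levels):
    """Enumerate every stratum (one level per covariate) for the given level counts."""
    return list(product(*(range(1, k + 1) for k in levels)))

levels = [2, 3, 4]            # hypothetical k_1, k_2, k_3
print(prod(levels))           # number of strata: 24
print(sum(levels))            # number of marginal levels: 9
print(strata(levels)[:3])     # first few strata as (level of Z_1, Z_2, Z_3)
```

The count of strata is a product over covariates, while the count of marginal levels is only a sum, which is one way to see why balancing within every stratum becomes impractical as covariates are added.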

The first covariate-adaptive randomization proce- 
dures were proposed in the mid-1970s. Taves (1974) 
proposed a deterministic method to allocate treat- 
ments designed to minimize imbalances on impor- 
tant covariates, called the minimization method. 
Pocock and Simon (1975) and Wei (1978) described 
generalizations of minimization to randomized clin- 
ical trials. We will refer to this class of covariate- 
adaptive randomization procedures as marginal pro- 
cedures, as they balance on covariates marginally, 
within each of the $k_i$ levels of given covariates. 

The general marginal procedure can be described as follows for a 
two-treatment clinical trial. Let $N_{ijl}(n)$ be the number of 
patients on treatment $l$ in level $j$ of covariate $Z_i$, 
$i = 1, \ldots, K$, $j = 1, \ldots, k_i$, $l = 1, 2$, after $n$ 
patients have been randomized. When patient $n+1$ is ready for 
randomization, the patient's baseline covariate vector 
$(Z_1, \ldots, Z_K)$ is observed as $(z_1, \ldots, z_K)$. Then 
$D_i(n) = N_{i z_i 1}(n) - N_{i z_i 2}(n)$ is computed for each 
$i = 1, \ldots, K$. A weighted sum is then taken as 
$D(n) = \sum_{i=1}^{K} w_i D_i(n)$. The measure $D(n)$ is used to 
determine the treatment of patient $n+1$. If $D(n) > 0$ ($< 0$), 
then one decreases (increases) the probability of being assigned 
to treatment 1 accordingly. Pocock and Simon (1975) formulated a 
general rule using Efron's (1971) biased coin design as: 

$$
\phi_{n+1} =
\begin{cases}
1/2, & \text{if } D(n) = 0, \\
p, & \text{if } D(n) < 0, \\
1 - p, & \text{if } D(n) > 0.
\end{cases}
$$

When p = 1, we have Taves's (1974) minimization 
method, which is nonrandomized. Pocock and Si- 
mon (1975) investigated p = 3/4. 
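
The marginal rule can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation of any of the cited authors: the data layout (`counts`), the function names, and the example covariates are all hypothetical. Treatments are labeled 1 and 2, and the weighted imbalance is the sum of weight times (number on treatment 1 minus number on treatment 2) over the new patient's observed covariate levels.

```python
import random

def pocock_simon_prob(counts, z, weights, p=0.75):
    """Probability that the new patient is assigned to treatment 1.

    counts[i][level] = [n on treatment 1, n on treatment 2] for covariate i;
    z[i] is the new patient's level of covariate i.
    """
    d = 0.0
    for i, level in enumerate(z):
        n1, n2 = counts[i][level]
        d += weights[i] * (n1 - n2)      # weighted marginal imbalance
    if d == 0:
        return 0.5
    return p if d < 0 else 1.0 - p       # favor the under-represented arm

def assign(counts, z, weights, p=0.75, rng=random):
    """Randomize one patient and update the marginal counts in place."""
    phi = pocock_simon_prob(counts, z, weights, p)
    t = 1 if rng.random() < phi else 2
    for i, level in enumerate(z):
        counts[i][level][t - 1] += 1
    return t

# Hypothetical state: two covariates (sex, age group), equal weights.
counts = [{"M": [3, 1], "F": [0, 0]}, {"young": [2, 2], "old": [0, 0]}]
print(pocock_simon_prob(counts, ("M", "young"), [1.0, 1.0]))
```

Setting `p=1.0` removes the randomness whenever the imbalance is nonzero, which corresponds to the deterministic minimization method described in the text.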

Wei (1978) proposed a different marginal procedure using urns. At 
the beginning of the trial, each of $\sum_{i=1}^{K} k_i$ urns contains 
$\alpha_1$ balls of type 1 and $\alpha_2$ balls of type 2. Let 
$U_{ij}$ denote the urn representing level $j$ of covariate $Z_i$, 
and let $Y_{ijk}(n)$ be the number of balls of type $k$ in urn 
$U_{ij}$ after $n$ patients have been randomized. For each urn 
compute the imbalance $D_{ij}(n) = (Y_{ij1}(n) - Y_{ij2}(n)) / 
(Y_{ij1}(n) + Y_{ij2}(n))$. Suppose patient $n+1$ has covariate 
vector $(z_1, \ldots, z_K)$. Select the urn such that 
$D_{i z_i}(n)$ is maximized. Draw a ball and replace. If it is a 
type $k$ ball, assign the patient to treatment $k$, and add 
$\alpha_k$ balls of type $k$ with $\beta_k > 0$ balls of the 
opposite type to each of the observed urns. The procedure is repeated 
for each new eligible patient entering the trial. Wei 
proved that if there is no interaction between the 
covariates or between the treatment effect and co- 
variates in a standard linear model, then marginal 
balance is sufficient to achieve an unbiased estimate 
of the treatment difference. Efron (1980) provided 
a covariate-adaptive randomization procedure that 
balances both marginally and within strata, but the 
method applies only to two covariates. 

There has been substantial controversy in the lit- 
erature as to whether the introduction of random- 
ization is necessary when covariate-adaptive proce- 
dures are used. Randomization mitigates the proba- 
bility of selection bias and accidental bias, and pro- 
vides a basis for inference (e.g., Rosenberger and 
Lachin, 2002). Taves's original paper did not advo- 
cate randomization, and, in fact, he still supports 
the view that randomization is unnecessary, writing 
in a letter to the editor (Taves, 2004, page 180): 

I hope that the day is not too far dis- 
tant when we look back on the current 
belief that randomization is essential to 
good clinical trial design and realize that 
it was. . . "credulous idolatry." 

Other authors have argued for using minimization 
without the additional component of randomization. 
Aickin (2001) argued that randomization is not needed 
in covariate-adaptive procedures because the covari- 
ates themselves are random, leading to randomness 
in the treatment assignments. He also argued that 
the usual selection bias argument for randomization 
is irrelevant in double-masked clinical trials with a 
central randomization unit. 

Several authors, such as Zelen (1974), Nordle and 
Brandmark (1977), Efron (1980), Signorini et al. 
(1993) and Heritier, Gebski and Pillai (2005), pro- 
posed covariate-adaptive randomization procedures 
which achieve balanced allocation both within mar- 
gins of the chosen factors and within strata. These 
methods emphasize the importance of balancing over 
interactions between factors when such exist. 
Raghavarao (1980) proposed an allocation procedure based on 
distance functions. When the new patient enters the trial, one 
computes $d_k$, the Mahalanobis distance between the covariate 
profile of the patient and the average of the patients already 
assigned to treatment $k$, where $k = 1, \ldots, K$. Then the 
patient is assigned to treatment $k$ with probability 
$p_k \propto d_k$. 

3. RANDOMIZATION-BASED INFERENCE 

One of the benefits of randomization is that it pro- 
vides a basis for inference (see Chapter 7 of Rosen- 
berger and Lachin, 2002). Despite this, assessment 
of treatment effects in clinical trials is often con- 
ducted using standard likelihood-based methods that 
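ignore the randomization procedure used. Letting 
$\mathbf{Y}^{(n)} = (Y_1, \ldots, Y_n)$ be the response vector, 
$\mathbf{T}^{(n)} = (T_1, \ldots, T_n)$ the treatment assignment 
vector and $\mathbf{Z}^{(n)} = (\mathbf{Z}_1, \ldots, \mathbf{Z}_n)$ 
the covariate vectors of patients $1, \ldots, n$, the likelihood can 
simply be written as 

$$
\begin{aligned}
\mathcal{L}_n &= \mathcal{L}(Y_n \mid \mathbf{Y}^{(n-1)}, \mathbf{T}^{(n)}, \mathbf{Z}^{(n)}; \theta) \\
&\quad \cdot \mathcal{L}(T_n \mid \mathbf{Y}^{(n-1)}, \mathbf{T}^{(n-1)}, \mathbf{Z}^{(n)}; \theta) \\
&\quad \cdot \mathcal{L}(\mathbf{Z}_n \mid \mathbf{Y}^{(n-1)}, \mathbf{T}^{(n-1)}, \mathbf{Z}^{(n-1)}) \, \mathcal{L}_{n-1}.
\end{aligned}
$$

As $\mathcal{L}(Y_n \mid \mathbf{Y}^{(n-1)}, \mathbf{T}^{(n)}, 
\mathbf{Z}^{(n)}; \theta) = \mathcal{L}(Y_n \mid T_n, \mathbf{Z}_n; 
\theta)$, the treatment assignments do not depend on $\theta$, and 
the covariates are considered i.i.d., we can reduce this to the 
recursion 

$$
\mathcal{L}_n \propto \mathcal{L}(Y_n \mid T_n, \mathbf{Z}_n; \theta) \, \mathcal{L}_{n-1}
= \prod_{i=1}^{n} \mathcal{L}(Y_i \mid T_i, \mathbf{Z}_i; \theta).
$$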

This is the standard regression equation under a 
population model; that is, the randomization is an- 
cillary to the likelihood. Thus, a proponent of the 
likelihood principle would ignore the design in the 
analysis, and proceed with tests standardly available 
in SAS. 

The alternative approach is to use a randomiza- 
tion test, which is a simple nonparametric alter- 
native. Under the null hypothesis of no treatment 
effect, the responses should be a deterministic se- 
quence unaffected by the treatment assigned. There- 
fore, the distribution of the test statistic under the 
null hypothesis is computed with reference to all 
possible sequences of treatment assignments under 
the randomization procedure. 
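
A minimal Monte Carlo sketch of such a test is given below. It rests on several assumptions of ours that are made only for illustration: the randomization procedure is a one-covariate biased coin, the test statistic is the absolute difference in mean responses, and the reference distribution is approximated by regenerating assignment sequences rather than enumerating all of them. Responses and covariates are held fixed in their order of arrival; all function names are hypothetical.

```python
import random

def diff_in_means(y, t):
    """Mean response on treatment 1 minus mean response on treatment 2."""
    g1 = [yi for yi, ti in zip(y, t) if ti == 1]
    g2 = [yi for yi, ti in zip(y, t) if ti == 2]
    if not g1 or not g2:              # degenerate sequence: one arm is empty
        return 0.0
    return sum(g1) / len(g1) - sum(g2) / len(g2)

def biased_coin_sequence(z, p, rng):
    """Regenerate assignments for the fixed covariate sequence z.

    Illustrative one-covariate biased coin: within the new patient's level,
    the under-represented treatment is chosen with probability p.
    """
    counts = {}                       # level -> [n on treatment 1, n on treatment 2]
    t = []
    for level in z:
        n1, n2 = counts.setdefault(level, [0, 0])
        d = n1 - n2
        phi = 0.5 if d == 0 else (p if d < 0 else 1.0 - p)
        k = 1 if rng.random() < phi else 2
        counts[level][k - 1] += 1
        t.append(k)
    return t

def randomization_test(y, z, t_obs, p=0.75, n_sim=2000, seed=0):
    """Monte Carlo two-sided p-value; y and z are treated as fixed."""
    rng = random.Random(seed)
    s_obs = abs(diff_in_means(y, t_obs))
    hits = sum(
        abs(diff_in_means(y, biased_coin_sequence(z, p, rng))) >= s_obs
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)   # observed sequence counted once

# Hypothetical data for eight patients in arrival order.
y = [5.1, 3.9, 6.2, 4.0, 5.8, 4.4, 6.0, 3.7]
z = ["a", "a", "b", "b", "a", "b", "a", "b"]
t_obs = [1, 2, 1, 2, 1, 2, 1, 2]
print(randomization_test(y, z, t_obs))
```

Because the sequences are regenerated in arrival order, this sketch conditions on the observed sequencing of patients, which is the feature that mattered in the example discussed next.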



Various authors have struggled with the appro- 
priate way to perform randomization tests follow- 
ing covariate-adaptive randomization. Pocock and 
Simon (1975) initially suggested that the sequence 
of covariate values and responses be treated as deter- 
ministic, and the sequence of treatment assignments 
be permuted for those specific covariate values. This 
is the approach taken by most authors. Ebbutt et al. 
(1997) presented an example where results differed 
when the randomization test took into consideration 
the sequencing of patient arrivals. Senn concluded 
from this that the disease was changing in some way 
through the course of the trial and thus there was a 
time trend present (see the discussion of Atkinson, 
1999). 

4. WHAT WE KNOW ABOUT COVARIATE- 
ADAPTIVE RANDOMIZATION PROCEDURES 

Our knowledge of covariate-adaptive randomiza- 
tion comes from (a) the original source papers; (b) 
a vast number of simulation papers; (c) advocacy or 
regulatory papers (for or against); and (d) review 
papers. Very little theoretical work has been done 
in this area, despite the proliferation of papers. The 
original source papers are fairly uninformative about 
theoretical properties of the procedures. In Pocock 
and Simon (1975), for instance, there is a small dis- 
cussion, not supported by theory, on the appropri- 
ate selection of biasing probability p. There is no 
discussion about the effect of the choice of weights 
for the covariates; no discussion about the effect on 
inference; no theoretical justification that the pro- 
cedure even works as intended: Do covariate imbal- 
ances (loosely defined) tend to zero? Does marginal 
balance imply balance within strata or overall? Wei 
(1978) devotes less than one page to a description of 
his procedure; he does prove that marginal balance 
implies balance within strata for a linear model with 
no interactions. Taves (1974) is a nontechnical pa- 
per with only intuitive justification of the method. 
Simulation papers have been contradictory. 

Klotz (1978) formalized the idea of finding an optimal value of 
biasing probability $p$ as a constrained maximization problem. 
Consider a trial with $K$ treatments and covariates. When patient 
$n+1$ is ready to be randomized, one computes $D_k$, the measure of 
overall covariate imbalance if the new patient is assigned to 
treatment $k = 1, \ldots, K$. The goal is to find the vector of 
randomization probabilities $\mathbf{p} = (p_1, \ldots, p_K)$ which 
maximizes the entropy measure subject to the constraint on the 
expected imbalance. Titterington (1983) built upon Klotz's idea and 
considered minimization of the quadratic distance between 
$\mathbf{p}$ and the vector of uniform probabilities 
$(1/K, \ldots, 1/K)$ subject to the constraints on the expected 
imbalance. 

Aickin (2001) provides perhaps one of the few the- 
oretical analyses of covariate-adaptive randomiza- 
tion procedures. He gives a very short proof contra- 
dicting some authors' claims that covariate-adaptive 
randomization can promote imbalances in unmeasured covariates. If 
$X_2$ is an unmeasured covariate, and covariate-adaptive 
randomization was used to balance on covariate $X_1$, then $X_2$ 
can be decomposed into its linear regression part, given by 
$L(X_2 \mid X_1)$, and its linear regression residual 
$X_2 - L(X_2 \mid X_1)$. If $X_1$ and $X_2$ are correlated 
positively or negatively, balancing on $X_1$ will improve the 
balance of $L(X_2 \mid X_1)$. Since the residual is not correlated 
with the randomization procedure, $X_2 - L(X_2 \mid X_1)$ will 
balance as well as with restricted or complete 
randomization. This is a formal justification of the 
intuitive argument that Taves (1974) gave in his 
original paper, an argument that Aickin (2001) says 
is a "remarkably insightful observation." Aickin also 
uses causal inference modeling to show that, if the 
unobserved errors correlated with the treatment as- 
signments and known covariates are linearly related 
to the known covariates, the treatment effect should 
be unbiased. 

There seems to be a troubling misconception in 
the literature with regard to covariate-adaptive ran- 
domization. For example, in an editorial in the British 
Medical Journal (Treasure and MacRae, 1998) we 
have the statement: 

The theoretical validity of the method of 
minimisation was shown by Smith 

The quotation refers to Smith (1984b), which actu- 
ally derives the asymptotic distribution of the ran- 
domization test following a model-based optimal de- 
sign approach favored by many authors. We shall 
discuss this approach momentarily, but it is impor- 
tant to point out that there is no justification, the- 
oretical or otherwise, of minimization methods in 
Smith's paper. 

In contrast to the dearth of publications exploring 
covariate-adaptive randomization from a theoretical 
perspective, a literature search revealed about 30 
papers reporting results of simulation studies. Some 
of these papers themselves are principally a review 
of various other simulation papers. A glance at the 
recent Society for Clinical Trials annual meeting ab- 
stract guide revealed about 10 contributed talks re- 
porting additional simulation results and their use 
in clinical trials, indicating the continuing popular- 
ity of these designs. 

Papers dealing with the comparison of stratified 
block designs with covariate-adaptive randomization 
methods with respect to achieving balance on co- 
variates include the original paper of Pocock and Si- 
mon (1975), Therneau (1993), and review papers by 
Kalish and Begg (1985) and Scott et al. (2002). The 
general consensus is that covariate-adaptive random- 
ization does improve balance for large numbers of 
covariates. 

Inference following covariate-adaptive randomization has been 
explored by simulation in Birkett (1985), using the t-test; Kalish 
and Begg (1987), using randomization tests; and Frane (1998), using 
analysis of covariance. Recent papers by Tu, Shalay and Pater (2000) 
and McEntegart (2003) cover a wide-ranging number of questions. Tu 
et al. found that the minimization method is inferior to 
stratification in reducing error rates, and argued that marginal 
balance is insufficient in the presence of interactions. McEntegart 
concluded that there is little difference in power between the 
minimization method and stratification. Hammerstrom (2003) performed 
some simulations and found that covariate-adaptive randomization 
does not significantly improve error rates, but does little harm, 
and therefore is useful only for cosmetic purposes. 

We conclude this section by interjecting some rele- 
vant questions. Does marginal balance improve power 
and efficiency, or is it simply cosmetic? Is covariate- 
adaptive randomization the proper approach to this 
problem? 

5. MODEL-BASED OPTIMAL 
DESIGN APPROACHES 

An alternate approach to balance is to find the 
optimal design that minimizes the variance of the 
treatment effect in the presence of covariates. This 
approach is first found in Harville (1974), not in the 
context of clinical trials, and in Begg and Iglewicz 
(1980). The resulting designs are deterministic. 

Atkinson (1982) adopted the approach and has advocated it in a 
series of papers, and in the 1982 paper, introduced randomization 
into the solution. In order to keep consistency with the original 
paper, we summarize Atkinson's approach for a general case of 
$K \geq 2$ treatments. Suppose $K$ treatments are to be compared, 
and responses follow the classical linear regression model given by 

$$E(Y_i) = \mathbf{x}_i' \beta, \quad i = 1, \ldots, n,$$

where the $Y_i$'s are independent with 
$\mathrm{Var}(\mathbf{Y}) = \sigma^2 I$ and $\mathbf{x}_i$ is a 
$(K + q) \times 1$ vector which includes treatment indicators and 
selected covariates of interest ($q$ is the number of covariates in 
the model). Let $\hat{\beta}$ be the least squares estimator of 
$\beta$. Then $\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$, 
where $X'X$ is the dispersion matrix from $n$ observations. 

For the construction of optimal designs we wish to find the $n$ 
points of experimentation at which some function is optimized (in 
our case we will be finding the optimal sequence of $n$ treatment 
assignments). The dispersion matrix evaluated at these $n$ points 
is given by $M(\xi_n) = X'X/n$, where $\xi_n$ is the $n$-point 
design. It is convenient, instead of thinking of $n$ points, to 
formulate the problem in terms of a measure $\xi$ (which in this 
case is a frequency distribution) over a design region 
$\Xi = \{1, \ldots, K\}$. 

Atkinson formulated the optimal design problem as a design that 
minimizes, in some sense, the variance of $A'\hat{\beta}$, where 
$A$ is a matrix of contrasts. One possible criterion is Sibson's 
(1974) $D_A$-optimality that maximizes 

$$|A' M^{-1}(\xi) A|^{-1}. \tag{1}$$

For any multivariable optimization problem, we compute the 
directional derivative of the criterion. In the case of the $D_A$ 
criterion in (1), we can derive the Fréchet derivative as 

$$d_A(\mathbf{x}, \xi) = \mathbf{x}' M^{-1}(\xi) A \left(A' M^{-1}(\xi) A\right)^{-1} A' M^{-1}(\xi) \mathbf{x},$$

for $\mathbf{x} \in \Xi$. By the classical Equivalence theorem of 
Kiefer and Wolfowitz (1960), the optimal design $\xi^*$ that 
maximizes the criterion (1) then satisfies the following equations: 

$$\sup_{\mathbf{x} \in \Xi} d_A(\mathbf{x}, \xi) \leq s \quad \forall \xi \in \Xi$$

and 

$$\sup_{\mathbf{x} \in \Xi} d_A(\mathbf{x}, \xi^*) = s.$$

Such a design is optimal for estimating linear contrasts of 
$\beta$. Assume $n$ patients have already been allocated, and the 
resulting $n$-point design is given by $\xi_n$. Let the value of 
$d_A(\mathbf{x}, \xi_n)$ for allocation of treatment $k$ be 
$d_A(k, \xi_n)$. Atkinson proposed a sequential design which 
allocates the $(n+1)$th patient to the treatment 
$k = 1, \ldots, K$ for which $d_A(k, \xi_n)$ is a maximum, given 
the patient's covariates. The resulting design is deterministic. 

In order to randomize the allocation, Atkinson suggested biasing 
a coin with probabilities 

$$p_k = \frac{\psi(d_A(k, \xi_n))}{\sum_{j=1}^{K} \psi(d_A(j, \xi_n))}, \tag{2}$$

where $\psi(x)$ is any monotone increasing function, and allocating 
to treatment $k$ with the corresponding probability. With two 
treatments, $k = 1, 2$, we have $s = 1$, 
$A' = (-1, 1, 0, \ldots, 0)$, and the probability of assigning 
treatment 1 is given by 

$$\phi_{n+1} = \frac{\psi(d_A(1, \xi_n))}{\psi(d_A(1, \xi_n)) + \psi(d_A(2, \xi_n))}. \tag{3}$$

(We consider only the case of two treatments in this paper.) 
Equation (3) gives a broad class of covariate-adaptive 
randomization procedures. The choice of function $\psi$ has not 
been explored adequately. Atkinson (1982) suggested using 
$\psi(x) = x$; Ball, Smith and Verdinelli (1993) suggested 
$\psi(x) = (1 + x)^{1/\gamma}$ for a parameter $\gamma > 0$, which 
is a compromise between randomness and efficiency. 
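
As a rough illustration of how the choice of $\psi$ moves the allocation between randomness and efficiency, the Python sketch below normalizes $\psi(d_A(k, \xi_n))$ over the treatments, which is the form we take rule (2) to have. The numerical values of $d_A$ and the helper names `allocation_probs` and `bsv_psi` are hypothetical and serve only to show the mechanics.

```python
def allocation_probs(d, psi=lambda x: x):
    """Normalize psi(d_k) over the treatments to obtain allocation probabilities."""
    w = [psi(dk) for dk in d]
    total = sum(w)
    return [wk / total for wk in w]

def bsv_psi(gamma):
    """Return psi(x) = (1 + x)**(1/gamma) for a given gamma > 0."""
    return lambda x: (1.0 + x) ** (1.0 / gamma)

d = [0.9, 0.1]                               # hypothetical d_A(1, xi_n), d_A(2, xi_n)
print(allocation_probs(d))                   # psi(x) = x
print(allocation_probs(d, bsv_psi(0.1)))     # small gamma: close to deterministic
print(allocation_probs(d, bsv_psi(100.0)))   # large gamma: close to a fair coin
```

In this sketch a small $\gamma$ pushes nearly all probability onto the treatment with the larger $d_A$, approaching the deterministic design, while a large $\gamma$ flattens the probabilities toward 1/2.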

Atkinson (1999, 2002) performed careful simula- 
tion studies to compare the performance of several 
covariate-adaptive randomization procedures for a 
linear model with constant variance and trials up 
to n = 200 patients. One criterion of interest was 
loss, the expected amount of information lost due 
to treatment and covariate imbalance. Another cri- 
terion was selection bias, measuring the probability 
of correctly guessing the next treatment assignment. 
Atkinson observed that the deterministic procedure based on the 
$D_A$-optimality criterion has the smallest value of loss, and 
Atkinson's randomized procedure (3) with $\psi(x) = x$ increases 
the loss. He noted that $D_A$-optimal designs are insensitive to 
correlation between the covariates, while complete randomization 
and the minimization method increase the loss when covariates are 
correlated. 

6. WHAT WE KNOW ABOUT ATKINSON'S 
CLASS OF PROCEDURES 

Considerably more theoretical work has been done 
on the class of procedures in (3) than for the covariate- 
adaptive randomization procedures in Section 2. Most 
of the work has been done in a classic paper by 
Smith (1984a), although he dealt with a variant on 
the procedure in (3). It is instructive to convert to 
his notation: 

$$E(Y_n) = \alpha t_n + \sum_{j=1}^{q} z_{nj} \beta_j,$$

where $Y_n$ and $t_n$ are the response and treatment assignment of 
the $n$th patient, respectively, and the $z_{nj}$ represent $q$ 
covariates, and may include an intercept. Let $\mathbf{T}_n$ be the 
treatment assignment vector and let $\mathbf{Z}_n$ be the matrix of 
covariates. Then Atkinson's procedure in (3) can be formulated as 
follows: assign $t_{n+1} = \pm 1$ with probabilities proportional 
to $(\pm 1 - \mathbf{z}_{n+1}' (\mathbf{Z}_n' \mathbf{Z}_n)^{-1} 
\mathbf{Z}_n' \mathbf{T}_n)^2$ (Smith, 1984b, page 543). Smith 
(1984a) introduced a more general class of allocation procedures 
given by 

$$\phi_{n+1} = \psi\left(n^{-1} \mathbf{z}_{n+1}' Q^{-1} \mathbf{Z}_n' \mathbf{T}_n\right), \tag{4}$$

where $\psi$ is a nonincreasing, twice continuously differentiable 
function with bounded second derivative satisfying 
$\psi(x) + \psi(-x) = 1$, and $Q = E(\mathbf{z}_n \mathbf{z}_n') = 
\lim_{n \to \infty} n^{-1} (\mathbf{Z}_n' \mathbf{Z}_n)$. It is 
presumed that the $\{\mathbf{z}_n\}$ are independent, identically 
distributed random vectors, $Q$ is nonsingular and all third 
moments of $\mathbf{z}_n$ are finite. Note that the procedure (4) 
can be implemented only if the distribution of covariates is known 
in the beginning of the trial. 

Smith suggested various forms of $\psi$, most leading
to a proportional biased coin raised to some power $\rho$.
In general, $\rho = -2\psi'(0)$. Without covariates, Atkinson's
procedure in (2) leads to

$$\phi_{n+1} = \frac{n_B^{\rho}}{n_A^{\rho} + n_B^{\rho}},$$

where $\rho = 2$, and $n_A$ and $n_B$ are the numbers of patients
already assigned to treatments A and B. Smith found the
asymptotic variance of the randomization test based on the
simple treatment effect, conditional on $Z_n$. He did not do any
further analysis or draw conclusions except to suggest
that $\rho$ should be selected by the investigator
to be as large as possible to balance the competing
goals of balance, accidental bias and selection bias.
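
To make the role of the power parameter concrete, here is a minimal simulation sketch of the no-covariate biased coin. It assumes the next patient is assigned to A with probability $n_B^\rho/(n_A^\rho + n_B^\rho)$, which is how the "probabilities proportional to squared terms" rule above reduces when there is only an intercept and $\rho = 2$. The function names, the seed and the fair-coin convention for the first patient are illustrative assumptions, not taken from Smith's papers.

```python
import random


def biased_coin_prob_a(n_a, n_b, rho):
    """Probability of assigning the next patient to A (assumed form).

    The arm with fewer patients is favored; rho = 0 gives a fair coin.
    """
    if n_a == 0 and n_b == 0:
        return 0.5  # illustrative convention for the very first patient
    w_a = float(n_b) ** rho
    w_b = float(n_a) ** rho
    return w_a / (w_a + w_b)


def simulate_trial(n, rho, rng):
    """Allocate n patients sequentially and return the arm sizes (n_a, n_b)."""
    n_a = n_b = 0
    for _ in range(n):
        if rng.random() < biased_coin_prob_a(n_a, n_b, rho):
            n_a += 1
        else:
            n_b += 1
    return n_a, n_b


rng = random.Random(2008)  # arbitrary seed, for reproducibility only
for rho in (0.0, 2.0, 10.0):
    diffs = [abs(a - b) for a, b in (simulate_trial(50, rho, rng) for _ in range(200))]
    print("rho =", rho, "mean |n_A - n_B| =", sum(diffs) / len(diffs))
```

Larger values of the power push the coin harder toward the under-represented arm, which tightens balance but makes the next assignment easier to guess; this is the trade-off between balance and selection bias mentioned in the text.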

7. BALANCE, EFFICIENCY OR ETHICS? 

Clinical trials have multiple objectives. The prin- 
cipal considerations are given in the schematic in 
Figure 1. Balance across treatment groups is often 
considered essential both for important covariates 
and for treatment numbers themselves. Efficiency 
is critical for demonstrating efficacy. Randomization 
mitigates certain biases. Ethics is an essential component
in any human experimentation, and dictates our treatment
of patients in the trial. These considerations
are sometimes compatible, and sometimes
in conflict. In this section, we describe the interplay 
among balance, efficiency and ethics in the context 
of randomized clinical trials, and give some exam- 
ples where they are in conflict. 

In a normal error linear model with constant vari- 
ance, numerical balance between treatments on the 
margins of the covariates is equivalent to minimizing 
the variance of the treatment effect. This is not true 
for nonlinear models, such as logistic regression or 
traditional models for survival analysis (Begg and 
Kalish, 1984; Kalish and Harrington, 1988). As we 
shall discuss further in the next section, balance does 
not imply efficiency except in specialized cases. This 
leaves open the question, is balance on covariates 
important? 

We have the conflict recorded in a fascinating in- 
terchange among Atkinson, Stephen Senn and John 
Whitehead (Atkinson, 1999). Whitehead argues: 

I think that one criterion is really to re- 
duce the probability of some large imbal- 
ance rather than the variance of the esti- 
mates. . . . And to make sure that these un- 
convincing trials, because of the large im- 
balance, happen with very low probabil- 
ity, perhaps is more important. ... I would 
always be wanting to adjust for these vari- 
ables. None the less, the message is sim- 
pler if my preferred adjusted analysis is 
similar to the simple message of the clin- 
icians. 

Fig. 1. Multiple objectives of a phase III clinical trial. [Schematic relating randomization, balance, ethics and efficiency; the links are labeled "equivalent" (homoscedastic linear models) and "may or may not conflict" (heteroscedastic and nonlinear models).]

Senn gives the counterargument:

I think we should avoid pandering to these
foibles of physicians. ... I think people worry
far too much about imbalance from the inferential
(sic) point of view. ... The way I
usually describe it to physicians is as follows:
if we have an unbalanced trial, you
can usually show them that by throwing 
away some patients you can reduce it to a 
perfectly balanced trial. So you can actu- 
ally show that within it there is a perfectly 
balanced trial. You can then say to them: 
'now, are you prepared to make an infer- 
ence on this balanced subset within the 
trial?' and they nearly always say 'yes.' 
And then I say to them, 'well how can a 
little bit more information be worse than 
having just this balance trial within it?' 

We thus encounter once again deep philosophical 
differences and the ingrained culture of clinical tri- 
alists. Fortunately, balance and efficiency are equiv- 
alent in homoscedastic linear models. Thus, strati- 
fied randomization and covariate-adaptive random- 
ization procedures (such as Pocock and Simon's 
method) are valid to the degree in which they force 
balance over covariates. Atkinson's model-based ap- 
proach is an alternative method that can incorpo- 
rate treatment-by-covariate interactions and contin- 
uous covariates. Atkinson's class of procedures for 
linear models has an advantage of being based on 
formal optimality criteria as opposed to ad hoc mea- 
sures of imbalance used in covariate-adaptive ran- 
domization procedures. On the other hand, balanced 
designs may not be most efficient in the case of non- 
linear and heteroscedastic models. We agree with 
Senn that cosmetic balance, while psychologically 
reassuring, should not be the goal if power or effi- 
ciency is lost in the process of forcing balance. 

First, let us illustrate that balanced allocation can 
be less efficient and less ethically appealing than 
unbalanced allocation in some instances, and that 
there may exist unbalanced designs which outperform
balanced designs in terms of compound objectives
of efficiency and ethics. Consider a binary
response trial of size $n$ comparing two treatments A
and B, and suppose there is an important binary
covariate $Z$, say gender ($Z = 0$ if a patient is male,
and $Z = 1$ if female), such that there are $n_0$ males
and $n_1$ females in the trial. Also assume that success
probabilities for treatment $k$ are $p_{k0}$ for males and
$p_{k1}$ for females, where $k = A, B$. Let $q_{kj} = 1 - p_{kj}$,
$j = 0, 1$. For the time being we will assume that the
true success probabilities are known. One measure
of the treatment effect for binary responses is the
log-odds ratio, which can be expressed as

$$\log \mathrm{OR}(Z = j) = \log\frac{p_{Aj}/q_{Aj}}{p_{Bj}/q_{Bj}}, \qquad j = 0, 1. \tag{5}$$



An experimental design question is to determine
allocation proportions $\pi_{Aj}$ and $\pi_{Bj}$ in stratum $j$
for treatments A and B, respectively, where $j = 0$
(male) or $j = 1$ (female). Let us consider the following
three allocation rules:

Rule 1: Balanced treatment assignments in the two
strata, given by

$$\pi_{Aj} = \pi_{Bj} = 1/2, \qquad j = 0, 1;$$

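Rule 2: Neyman allocation maximizing the power
of the stratified asymptotic test of the log-odds ratio:

$$\frac{\log \widehat{\mathrm{OR}}(Z = j)}{\sqrt{\widehat{\mathrm{var}}(\log \widehat{\mathrm{OR}}(Z = j))}}, \qquad j = 0, 1.$$

The allocation proportion is given by

$$\pi_{Aj} = \frac{1/\sqrt{p_{Aj}q_{Aj}}}{1/\sqrt{p_{Aj}q_{Aj}} + 1/\sqrt{p_{Bj}q_{Bj}}}, \qquad j = 0, 1;$$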

Rule 3: The analog of Rosenberger et al.'s (2001)
optimal allocation minimizing the expected number
of treatment failures in the trial subject to the fixed
variance of the log-odds ratio. This is given by

$$\pi_{Aj} = \frac{1/\sqrt{p_{Aj}q_{Aj}^2}}{1/\sqrt{p_{Aj}q_{Aj}^2} + 1/\sqrt{p_{Bj}q_{Bj}^2}}, \qquad j = 0, 1.$$

Note that unlike Rule 1, Rules 2 and 3 depend 
on success probabilities in the two strata, and are 
unbalanced, in general. Consider a case when
$n_0 = n_1 = 100$ and let $(p_{A0}, p_{B0}) = (0.95, 0.7)$ and
$(p_{A1}, p_{B1}) = (0.7, 0.95)$. This represents a case when one
of the treatments is highly successful, there is a significant
treatment difference between A and B, and
there is treatment-by-covariate interaction (treatment A
is more successful for males and is less successful
for females). Then allocation proportions for
treatment A in the two strata are $\pi_{A0} = 0.68$ and
$\pi_{A1} = 0.32$ for Rule 2, and $\pi_{A0} = 0.84$ and
$\pi_{A1} = 0.16$ for Rule 3.

All three rules are very similar in terms of ef- 
ficiency, as measured by the asymptotic variances 
of stratum-specific estimates of the log-odds ratio. 
However, Rules 2 and 3 provide extra ethical savings.
For the sample size considered, Rule 3 is expected
to have 16 fewer failures than the balanced
design. At the same time, Rule 2, whose primary 
purpose is optimizing efficiency, is expected to have 
8 fewer failures than the balanced allocation. There- 
fore, in addition to maximizing efficiency, Rule 2 
provides additional ethical savings, and is certainly 
far more attractive than balanced allocation. 
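
The comparison of the three rules can be reproduced with a short calculation. The sketch below assumes that Rule 2 weights each arm by $1/\sqrt{p q}$ and Rule 3 by $1/\sqrt{p q^2}$ within a stratum, and uses the standard large-sample variance of a log-odds ratio, $1/(n_A p_A q_A) + 1/(n_B p_B q_B)$, as the efficiency measure. That variance formula and all function names are assumptions made for illustration, not taken from the authors' code. Small differences from the rounded figures quoted in the text may arise from rounding conventions.

```python
from math import sqrt


def neyman_prop_a(p_a, p_b):
    """Rule 2: proportion on A within a stratum (weights 1/sqrt(p*q))."""
    w_a = 1.0 / sqrt(p_a * (1.0 - p_a))
    w_b = 1.0 / sqrt(p_b * (1.0 - p_b))
    return w_a / (w_a + w_b)


def optimal_prop_a(p_a, p_b):
    """Rule 3: proportion on A within a stratum (weights 1/sqrt(p*q^2))."""
    w_a = 1.0 / sqrt(p_a * (1.0 - p_a) ** 2)
    w_b = 1.0 / sqrt(p_b * (1.0 - p_b) ** 2)
    return w_a / (w_a + w_b)


def expected_failures(n_j, pi_a, p_a, p_b):
    """Expected number of failures among n_j patients of one stratum."""
    return n_j * (pi_a * (1.0 - p_a) + (1.0 - pi_a) * (1.0 - p_b))


def var_log_or(n_j, pi_a, p_a, p_b):
    """Large-sample variance of the stratum log-odds ratio (assumed form)."""
    return (1.0 / (n_j * pi_a * p_a * (1.0 - p_a))
            + 1.0 / (n_j * (1.0 - pi_a) * p_b * (1.0 - p_b)))


# Scenario from the text: (p_A, p_B) per stratum, 100 patients in each.
STRATA = {0: (0.95, 0.70), 1: (0.70, 0.95)}
N_PER_STRATUM = 100
RULES = {
    "Rule 1": lambda p_a, p_b: 0.5,
    "Rule 2": neyman_prop_a,
    "Rule 3": optimal_prop_a,
}


def total_expected_failures(rule):
    return sum(expected_failures(N_PER_STRATUM, rule(p_a, p_b), p_a, p_b)
               for p_a, p_b in STRATA.values())


for name, rule in RULES.items():
    props = [round(rule(*STRATA[j]), 2) for j in (0, 1)]
    variances = [round(var_log_or(N_PER_STRATUM, rule(*STRATA[j]), *STRATA[j]), 3)
                 for j in (0, 1)]
    print(name, "pi_A by stratum:", props,
          "var(log OR):", variances,
          "expected failures:", round(total_expected_failures(rule), 1))
```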

So far we have compared different target alloca- 
tions for "fixed" designs, that is, for a given num- 
ber of patients in each treatment group and known 
model parameters. In practice, true success proba- 
bilities are not available at the trial onset, which 
precludes direct implementation of Rules 2 and 3. 
Since clinical trials are sequential in nature, one 
can use accruing responses to estimate the parame- 
ters, and then cast a randomization procedure which 
asymptotically achieves the desired allocation. To 
study operating characteristics of response-adaptive 
randomization procedures targeting Neyman alloca- 
tion (Rule 2) and optimal allocation (Rule 3) we 
ran a simulation study in R using 10,000 replica- 
tions (results are available from the second author 
upon request). In the simulations we assumed that 
two strata (male and female) are equally likely. For 
Rules 2 and 3, the doubly adaptive biased coin de- 
sign (DBCD) procedure of Hu and Zhang (2004) was 
used within each stratum to sequentially allocate pa- 
tients to treatment groups. In addition, balanced al- 
location was implemented using stratified permuted 
block design (PBD) with block size $m = 8$. We assumed
that responses are immediate, and compared
the procedures with respect to power of the stratified
asymptotic test of the log-odds ratio for testing
the null hypothesis $H_0$: $(p_{A0} = p_{B0})$ and $(p_{A1} = p_{B1})$
versus $H_A$: not $H_0$ using significance level $\alpha = 0.05$,
and the expected number of treatment failures.
We considered several experimental scenarios
for success probabilities $(p_{Aj}, p_{Bj})$, $j = 0, 1$, including
the one described in the example above. To facilitate
comparisons, the sample size for each experimental
scenario was chosen such that the stratified
block design achieves approximately 80% power
of the test. In summary, response-adaptive randomization
procedures worked as expected: for chosen
sample sizes they converged to the targeted allocations
and preserved the nominal significance level.
Additionally, response-adaptive randomization procedures
had similar average power to the PBD, but
on average they had fewer treatment failures. Ethical
savings of response-adaptive designs were more
pronounced when one of the treatments had high 
success probability (0.8-0.9) and treatment differ- 
ences were large. 
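
A minimal sketch of how a doubly adaptive biased coin can be run within one stratum is given below. It uses the allocation function commonly associated with Hu and Zhang (2004), $g(x, y) = y(y/x)^\gamma / [y(y/x)^\gamma + (1-y)((1-y)/(1-x))^\gamma]$, where $x$ is the current proportion on A and $y$ the estimated target. The value $\gamma = 2$, the alternating burn-in of $2 m_0$ patients, the smoothed estimates $(s + 0.5)/(n + 1)$ and all names are illustrative assumptions; this is not the authors' simulation program.

```python
import random
from math import sqrt


def dbcd_prob(x, y, gamma=2.0):
    """Allocation function g(x, y): probability of assigning A.

    x: current proportion of patients on A; y: estimated target proportion.
    """
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    num = y * (y / x) ** gamma
    den = num + (1.0 - y) * ((1.0 - y) / (1.0 - x)) ** gamma
    return num / den


def neyman_target(p_a, p_b):
    """Target proportion on A under Neyman allocation for the log-odds ratio."""
    s_a = sqrt(p_a * (1.0 - p_a))
    s_b = sqrt(p_b * (1.0 - p_b))
    return s_b / (s_a + s_b)


def simulate_stratum(n, p_a, p_b, m0=5, gamma=2.0, rng=None):
    """Simulate one stratum; returns (patients on A, total failures)."""
    if m0 < 1:
        raise ValueError("m0 must be at least 1 so that both arms have data")
    rng = rng or random.Random()
    truth = {"A": p_a, "B": p_b}
    count = {"A": 0, "B": 0}
    success = {"A": 0, "B": 0}
    for i in range(n):
        if i < 2 * m0:
            # Deterministic alternating burn-in, used here only for brevity.
            arm = "A" if i % 2 == 0 else "B"
        else:
            # Smoothed estimates keep the target strictly inside (0, 1).
            est = {k: (success[k] + 0.5) / (count[k] + 1.0) for k in "AB"}
            y = neyman_target(est["A"], est["B"])
            x = count["A"] / i
            arm = "A" if rng.random() < dbcd_prob(x, y, gamma) else "B"
        count[arm] += 1
        if rng.random() < truth[arm]:  # immediate binary response
            success[arm] += 1
    return count["A"], n - success["A"] - success["B"]


n_a, failures = simulate_stratum(100, 0.95, 0.70, rng=random.Random(11))
print("patients on A:", n_a, "failures:", failures)
```

When the current proportion equals the target, the coin simply returns the target; when the design lags behind the target, the probability of assigning A is pushed above it, which is what drives convergence to the desired allocation.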

We would also like to emphasize that phase III 
trials are pivotal studies, and one typically has an 
idea about the success probabilities of the treat- 
ments from early stage trials. If a particular allo- 
cation is such that it leads to high power of the test, 
and it is also skewed toward the better treatment, 
then it makes sense to implement such a procedure. 
The additional ethical savings can be prominent if 
the ethical costs associated with trial outcomes are 
high, such as deaths of trial participants. 

8. CARA RANDOMIZATION 

Hu and Rosenberger (2006) define a covariate- 
adjusted response-adaptive (CARA) randomization 
procedure as one for which randomization probabil- 
ities for a current patient depend on the history of 
previous patients' treatment assignments, responses 
and covariates, and the covariate vector of the cur- 
rent patient, that is, 

$$\phi_{j+1} = \Pr(T_{j+1} = 1 \mid \mathbf{T}_j, \mathbf{Y}_j, Z_1, \ldots, Z_j, Z_{j+1}). \tag{6}$$

There have been only a few papers dealing with CARA
randomization, and it has become an area of active 
research. CARA randomization is an extension of 
response-adaptive randomization which deals with 
adjustment for covariates. Response-adaptive ran- 
domization has a rich history in the literature, and 
the interested reader is referred to Section 1.2 of Hu 
and Rosenberger (2006). 

Bandyopadhyay and Biswas (2001) considered a 
linear regression model for two treatments and co- 
variates with an additive treatment effect and con- 
stant variance. Suppose large values of response cor- 
respond to a higher efficacy. Then the new patient 
is randomized to treatment 1 with probability 

$$\phi_{j+1} = \Phi(d_j/T), \tag{7}$$

where $d_j$ is the difference of covariate-adjusted treatment
means estimated from the first $j$ patients, $T$
is a scaling constant and $\Phi$ is the standard normal
c.d.f. Although procedure (7) depends on the
full history from $j$ patients, it does not account for
covariates of the $(j+1)$th patient, and it is not a
CARA procedure in the sense of (6). Also, this procedure
depends on the choice of $T$, and small values
of $T$ can lead to severe treatment imbalances, which
in turn can lead to high power losses.
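
The sensitivity to the scaling constant is easy to see numerically. The sketch below evaluates the probit rule for a fixed estimated difference and several values of the constant; the function name and the chosen values are illustrative assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and s.d. 1


def prob_treatment_1(d_j, t_scale):
    """Probability of assigning treatment 1 under the probit rule Phi(d_j / T)."""
    if t_scale <= 0:
        raise ValueError("the scaling constant T must be positive")
    return _STD_NORMAL.cdf(d_j / t_scale)


# Same estimated difference, different scaling constants.
for t_scale in (0.5, 1.0, 2.0, 5.0):
    print("T =", t_scale, "probability:", round(prob_treatment_1(1.0, t_scale), 3))
```

For a given estimated difference, a small constant drives the probability toward 0 or 1 and so produces heavily skewed allocations, while a large constant keeps the coin close to fair.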

Atkinson and Biswas (2005a, 2005b) improved the 
allocation rule of Bandyopadhyay and Biswas (2001) 
by proposing CARA procedures that are based on a 
weighted $D_A$-optimal criterion combining both efficiency
and ethical considerations. They investigated
operating characteristics of the proposed designs 
through simulation, but they did not derive asymp- 
totic properties of the estimators and allocation pro- 
portions. Without the asymptotic properties of the 
estimators, it is difficult to assess the validity of sta- 
tistical inferences following CARA designs. 

A few papers describe CARA designs for binary 
response trials. One of the first papers in this field is 
by Rosenberger, Vidyashankar and Agarwal (2001). 
They assumed that responses in treatment group 
k = A,B follow the logistic regression model 

$$\operatorname{logit}(\Pr(Y_k = 1 \mid Z = z)) = \theta'_k z,$$

where $\theta_k$ is a vector of model parameters for treatment
$k$. Let $\hat\theta_{jA}$ and $\hat\theta_{jB}$ be the maximum likelihood
estimators of model parameters computed from the
data from $j$ patients. Then the $(j+1)$th patient is
randomized to treatment A with probability

$$\phi_{j+1} = F((\hat\theta_{jA} - \hat\theta_{jB})' z_{j+1}),$$

where $F$ is the standard logistic c.d.f. Basically, each
patient is allocated according to the current value of
the covariate-adjusted odds ratio comparing treatments
A and B. The authors compared their procedure 
with complete randomization through simulations 
assuming delayed responses. They showed that for 
larger treatment effects both procedures have simi- 
lar power, but at the same time the former results in 
a smaller expected proportion of treatment failures. 
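
The link between this rule and the odds ratio can be checked directly: applying the logistic c.d.f. to the difference of linear predictors gives the same number as the odds on A divided by the sum of the odds on A and B. The sketch below illustrates this; the parameter vectors are hypothetical stand-ins for current estimates, and the covariate vector (intercept, gender, age, cholesterol) is an assumed example.

```python
from math import exp


def logistic_cdf(x):
    return 1.0 / (1.0 + exp(-x))


def linear_predictor(theta, z):
    return sum(t * v for t, v in zip(theta, z))


def cara_prob_a(theta_a, theta_b, z):
    """F((theta_A - theta_B)' z): probability of assigning treatment A."""
    return logistic_cdf(linear_predictor(theta_a, z) - linear_predictor(theta_b, z))


def odds_ratio_form(theta_a, theta_b, z):
    """The same probability written as odds_A / (odds_A + odds_B)."""
    p_a = logistic_cdf(linear_predictor(theta_a, z))
    p_b = logistic_cdf(linear_predictor(theta_b, z))
    odds_a = p_a / (1.0 - p_a)
    odds_b = p_b / (1.0 - p_b)
    return odds_a / (odds_a + odds_b)


# Hypothetical current estimates and an incoming patient's covariates.
theta_a_hat = (-1.4, -0.8, 0.04, 0.001)
theta_b_hat = (-0.4, 0.2, 0.015, 0.004)
z_next = (1.0, 0, 50, 200.0)  # intercept, gender, age, cholesterol

print(cara_prob_a(theta_a_hat, theta_b_hat, z_next))
print(odds_ratio_form(theta_a_hat, theta_b_hat, z_next))
```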

Bandyopadhyay, Biswas and Bhattacharya (2007) 
also dealt with binary responses. They proposed a 
two-stage design for the logistic regression model. 
At the first stage, 2m patients are randomized to 
treatment A or B in a 1 : 1 ratio and accumulated 
data are used to estimate model parameters. At the 
second stage, each patient is randomized to treat- 
ment A with a probability which depends on the 
treatment effect estimated from the first stage and 
the current patient's covariate vector. 

Theoretical properties of CARA procedures have 
been developed in a recent paper by Zhang et al. 
(2007). This paper proposed a general framework for 
CARA randomization procedures for a very broad 
class of models, including generalized linear models. 



In the paper the authors proved strong consistency 
and asymptotic normality of both maximum likeli- 
hood estimators and allocation proportions. They 
also examined the CARA design of Rosenberger, 
Vidyashankar and Agarwal (2001) and provided 
asymptotic properties of the procedure. 

CARA procedures do not lend themselves to anal- 
ysis via randomization-based inference. The theo- 
retical validity of randomization tests is based on 
conditioning on the outcome data as a set of suffi- 
cient statistics, and then permuting the treatment 
assignments. Under the null hypothesis of no treat- 
ment difference, the observed outcome data should 
be exchangeable, leading to a valid randomization p- 
value (see Pesarin, 2001). However, under the CARA 
procedure, the treatment assignments and outcomes 
form the sufficient statistics, and conditioning on 
both would leave nothing. One could perform a stan- 
dard permutation test on the resulting data by intro- 
ducing a "sham" equiprobable randomization, but 
one would lose information about treatment efficacy. 

Therefore, we rely on likelihood-based methods 
to conduct inference following a CARA randomiza- 
tion procedure, and Zhang et al. (2007) provide the 
necessary asymptotic theory. For further discussion 
of appropriate inference procedures following general
response-adaptive randomization procedures, refer
to Chapter 3 of Hu and Rosenberger (2006) and 
Baldi Antognini and Giovagnoli (2005, 2006). 

9. COMPARING DIFFERENT 
RANDOMIZATION PROCEDURES WHICH 
ACCOUNT FOR COVARIATES 

In the following we used simulation to compare the 
operating characteristics of several covariate-adaptive 
randomization procedures and CARA procedures for 
the logistic regression model. We used the covariate 
structure considered in Rosenberger, Vidyashankar 
and Agarwal (2001). Assume that responses for treat- 
ment k satisfy the following logistic regression model: 

$$\operatorname{logit}(\Pr(Y_k = 1 \mid z)) = \alpha_k + \sum_{j=1}^{3} \beta_{kj} z_j, \tag{8}$$

where $\alpha_k$ is the treatment effect, and $\beta_{kj}$ is the effect
due to the $j$th covariate in treatment group
$k = A, B$. The parameter of interest is the covariate-adjusted
treatment difference $\alpha_A - \alpha_B$. The
components of covariate vector $z' = (z_1, z_2, z_3)$,
which represent gender, age and cholesterol level,
were assumed to be independently distributed as
Bernoulli(1/2), Discrete Uniform[30, 75] and
Normal(200, 20). Note that model (8) allows for
treatment-by-covariate interactions, since covariate
effects $\beta_{kj}$'s are not the same across the treatments.

The operating characteristics of designs included 
measures of balance, efficiency and ethics. For balance
we considered the allocation proportion
$N_A(n)/n$, and the allocation proportions within the
male category of covariate gender, $N_{A0}(n)/N_0(n)$.
Also, we examined the Kolmogorov-Smirnov distance
$d_{\mathrm{KS}}(z_2)$ between empirical distributions of covariate
age in treatment groups A and B. The efficiency
of procedures was measured by the average
power of the asymptotic test of the log-odds ratio
evaluated at a given $z_0$. The ethical aspect of a procedure
was assessed by the total number of treatment
failures, $F(n)$.

The sample size n was chosen in such a way that 
complete randomization yields approximately 80% 
or 90% power of the test of log-odds ratio under 
a particular alternative. For each choice of n we 
also estimated the significance level of the test under
the null hypothesis. We report the results for
three sets of parameter values given in Table 1. Under
the null hypothesis of no treatment difference
(Model 1), $n = 200$. When $\alpha_A - \alpha_B = -1$ (Model 2),
the choice of $n = 200$ yields 80% power for complete
randomization. When $\alpha_A - \alpha_B = -1.25$ (Model 3),
we let $n = 160$, which corresponds to 90% power for
complete randomization.

The first class of procedures are CARA designs. 
For their implementation, we need to sequentially 
estimate model parameters. In our simulations we 
assumed that all responses are immediate after ran- 
domization, although we can add a queuing struc- 
ture to explore the effects of delayed response. For 
CARA procedures, some data must accumulate so 
that the logistic model is estimable. We used Pocock
and Simon's method to allocate the first $2m_0$ patients
to treatments A and B.



Table 1

Parameter values for the logistic regression model (8) used
in simulations

                  Model 1             Model 2             Model 3
Parameter         A        B          A        B          A        B

$\alpha_k$       -1.652   -1.652     -1.402   -0.402     -1.652   -0.402
$\beta_{k1}$     -0.810   -0.810     -0.810    0.173     -0.810    0.173
$\beta_{k2}$      0.038    0.038      0.038    0.015      0.038    0.015
$\beta_{k3}$      0.001    0.001      0.001    0.004      0.001    0.004

Suppose after $n > 2m_0$ allocations the m.l.e. of
$\theta_k$ has been computed as $\hat\theta_{n,k}$. Then, for a sequential
m.l.e. CARA procedure, the $(n+1)$th patient
with covariate $z_{n+1}$ is allocated to treatment A with
probability $\phi_{n+1} = \rho(\hat\theta_{n,A}, \hat\theta_{n,B}, z_{n+1})$. We explored
four different choices of $\rho$:

1. Rosenberger, Vidyashankar and Agarwal's (2001)
target:

$$\rho_1 = \frac{p_A(z)/q_A(z)}{p_A(z)/q_A(z) + p_B(z)/q_B(z)}.$$

2. Covariate-adjusted version of Rosenberger et al.'s
(2001) allocation:

$$\rho_2 = \frac{\sqrt{p_A(z)}}{\sqrt{p_A(z)} + \sqrt{p_B(z)}}.$$

3. Covariate-adjusted version of Neyman allocation:

$$\rho_3 = \frac{\sqrt{p_B(z)q_B(z)}}{\sqrt{p_B(z)q_B(z)} + \sqrt{p_A(z)q_A(z)}}.$$

4. Covariate-adjusted version of optimal allocation:

$$\rho_4 = \frac{\sqrt{p_B(z)}\,q_B(z)}{\sqrt{p_B(z)}\,q_B(z) + \sqrt{p_A(z)}\,q_A(z)}.$$

Here $p_k(z) = 1/(1 + \exp(-\theta'_k z))$ and $q_k(z) = 1 - p_k(z)$,
$k = A, B$. We will refer to CARA procedures
with four described targets as CARA 1, CARA 2,
CARA 3 and CARA 4, respectively.
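
For a given pair of covariate-specific success probabilities, the four targets can be compared side by side. The sketch below assumes the targets take the following forms: an odds-based target (CARA 1), a square-root target $\sqrt{p_A}/(\sqrt{p_A} + \sqrt{p_B})$ (CARA 2), Neyman allocation for the log-odds ratio (CARA 3) and the failure-minimizing allocation with weights $1/(\sqrt{p}\,q)$ (CARA 4). The function name and the example probabilities are illustrative assumptions.

```python
from math import sqrt


def cara_targets(p_a, p_b):
    """Target probability of assigning A for each of the four CARA targets.

    p_a, p_b: success probabilities on A and B for the current patient's
    covariates, evaluated at the current parameter estimates.
    """
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    odds_a, odds_b = p_a / q_a, p_b / q_b
    return {
        "CARA 1": odds_a / (odds_a + odds_b),
        "CARA 2": sqrt(p_a) / (sqrt(p_a) + sqrt(p_b)),
        "CARA 3": sqrt(p_b * q_b) / (sqrt(p_b * q_b) + sqrt(p_a * q_a)),
        "CARA 4": sqrt(p_b) * q_b / (sqrt(p_b) * q_b + sqrt(p_a) * q_a),
    }


# Example: treatment A is clearly better for this covariate profile.
for name, value in cara_targets(0.95, 0.70).items():
    print(name, round(value, 3))
```

All four targets equal 1/2 when the two success probabilities coincide, and all favor the better treatment otherwise; they differ in how strongly they skew.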

We also considered an analogue of Atkinson and
Biswas's (2005a) procedure for the binary response
case. It is worthwhile to describe this approach in
more detail. Consider model (8) and let
$\theta_k = (\alpha_k, \beta_{k1}, \beta_{k2}, \beta_{k3})'$.
Suppose that a trial has $n_A$ patients
allocated to treatment A and $n_B = n - n_A$ patients
allocated to treatment B. Then the information matrix
about $\theta = (\theta_A, \theta_B)$ based on $n$ observations is
of the form

$$M_n = \operatorname{diag}\{Z'_A W_A Z_A,\; Z'_B W_B Z_B\},$$

where $Z_k$ is the $n_k \times p$ matrix of covariates for treatment
$k$, $W_k$ is the $n_k \times n_k$ diagonal matrix with elements
$p_k q_k$. Here $p_k = p_k(z_i, \theta_k)$ denotes the success
probability on treatment $k$ given $z_i$ and $q_k = 1 - p_k$,
$k = A, B$. Suppose the $(n+1)$th patient enters the
trial. Then the directional derivative of the criterion
$\det(M)$ for treatment $k$ given $z_{n+1}$ is computed as

$$d(k, \theta_n, z_{n+1}) = z'_{n+1}(Z'_k W_k Z_k)^{-1} z_{n+1}\, p_k q_k. \tag{9}$$
Note that (9) depends on $\theta_k$, which must be estimated
using the m.l.e. $\hat\theta_{n,k}$. The $(n+1)$th patient
is randomized to treatment A with probability

$$\phi_{n+1} = \frac{f_A\, d(A, \hat\theta_n, z_{n+1})}{f_A\, d(A, \hat\theta_n, z_{n+1}) + f_B\, d(B, \hat\theta_n, z_{n+1})}, \tag{10}$$

where $f_k$ is the desired proportion on treatment $k$.
We take $f_k = p_k(z)/q_k(z)$. The CARA procedure
(10) will be referred to as CARA 5.

The second class of allocation rules are covariate- 
adaptive randomization procedures. For Pocock and 
Simon's (P-S) procedure, each component of $z_{n+1}$ is
discretized into two levels, and the sum of marginal
imbalances within these levels is computed. The $(n+1)$th
patient is allocated with probability 3/4 to the
treatment which would minimize total covariate im- 
balance. If imbalances for treatments A and B are 
equal, then the patient is assigned to either treat- 
ment with probability 1/2. 
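
The Pocock and Simon step described above can be sketched as follows. The marginal imbalance is taken here to be the absolute difference between arm counts within each level of the new patient's covariates, summed over covariates, which is one common choice and an assumption on our part. The cut points used to discretize age and cholesterol, the data layout and the function names are likewise hypothetical.

```python
import random


def new_counts(n_factors=3):
    """counts[factor][level] = [patients on A, patients on B]."""
    return [[[0, 0], [0, 0]] for _ in range(n_factors)]


def imbalance_if_assigned(arm, levels, counts):
    """Sum of marginal |n_A - n_B| if the new patient were given `arm`."""
    total = 0
    for factor, level in enumerate(levels):
        n_a, n_b = counts[factor][level]
        if arm == "A":
            n_a += 1
        else:
            n_b += 1
        total += abs(n_a - n_b)
    return total


def prob_a(levels, counts, p=0.75):
    """Probability of assigning A: p to the arm that minimizes imbalance."""
    imb_a = imbalance_if_assigned("A", levels, counts)
    imb_b = imbalance_if_assigned("B", levels, counts)
    if imb_a == imb_b:
        return 0.5
    return p if imb_a < imb_b else 1.0 - p


def assign(levels, counts, rng, p=0.75):
    """Randomize one patient and update the marginal counts."""
    arm = "A" if rng.random() < prob_a(levels, counts, p) else "B"
    for factor, level in enumerate(levels):
        counts[factor][level][0 if arm == "A" else 1] += 1
    return arm


def discretize(gender, age, chol):
    """Two levels per covariate; the cut points are hypothetical."""
    return (gender, int(age > 52), int(chol > 200.0))


rng = random.Random(3)
counts = new_counts()
arms = [assign(discretize(rng.randint(0, 1), rng.randint(30, 75),
                          rng.gauss(200.0, 20.0)), counts, rng)
        for _ in range(200)]
print("patients on A:", arms.count("A"), "of", len(arms))
```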

For the stratified permuted block design (SPBD), 
the stratum of the current patient is determined 
based on the observed combination of the patient's 
covariate profile. Within that stratum allocations 
are made using permuted blocks of size $m = 10$. It
is possible that some strata had unfilled last blocks, and
thus perfect balance is not achieved. However, we
did not specifically examine this feature of SPBD. 
We also report the results for complete randomiza- 
tion (CRD). 
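
A stratified permuted block design of this kind can be sketched in a few lines. Each stratum keeps its own current block, which is refilled with a fresh random permutation of equal numbers of A and B once it is exhausted. The class name, the stratum labels and the seed are illustrative assumptions.

```python
import random


class StratifiedPermutedBlocks:
    """Permuted-block randomization carried out separately in each stratum."""

    def __init__(self, block_size, rng):
        if block_size % 2 != 0:
            raise ValueError("block size must be even for 1:1 allocation")
        self.block_size = block_size
        self.rng = rng
        self._open_blocks = {}  # stratum -> assignments left in its block

    def assign(self, stratum):
        block = self._open_blocks.get(stratum)
        if not block:  # no block yet, or the previous block is used up
            half = self.block_size // 2
            block = ["A"] * half + ["B"] * half
            self.rng.shuffle(block)
            self._open_blocks[stratum] = block
        return block.pop()


design = StratifiedPermutedBlocks(10, random.Random(5))
full_block = [design.assign(("male", "young", "low")) for _ in range(10)]
partial_block = [design.assign(("female", "old", "high")) for _ in range(7)]
print(full_block.count("A"), full_block.count("B"))
print(partial_block.count("A"), partial_block.count("B"))
```

A completed block is perfectly balanced, whereas a stratum whose last block is only partly filled can be left with a small imbalance, which is the feature noted in the text.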

The program performing the simulations was writ- 
ten in R. For each procedure, a trial with n patients 
was simulated 5000 times. To facilitate the comparison
of the procedures, the $n \times 4$ matrix of covariates
$Z$ was generated once and was held fixed for all simulations.
For CARA procedures, the first $2m_0 = 80$
patients were randomized by Pocock and Simon's
procedure with biasing probability $p = 3/4$. The response
probabilities of patients in treatment group
$k = A, B$ were computed by multiplying the rows of
$Z$ by the vector of model parameters and calculating
the logistic c.d.f. $F(x) = 1/(1 + \exp(-x))$ at the
computed values. The significance level of the test
was set at $\alpha = 0.05$, two-sided.
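
The data-generating step can be sketched as follows, in Python rather than the authors' R. The covariate matrix has an intercept column followed by gender, age and cholesterol, and response probabilities are obtained by applying the logistic c.d.f. to each row times the parameter vector. The parameter values are those read from Table 1 for Model 2; treating the 20 in Normal(200, 20) as a standard deviation, the seed and the function names are assumptions.

```python
import random
from math import exp

# Model 2 parameters (alpha_k, beta_k1, beta_k2, beta_k3), as read from Table 1.
THETA_MODEL_2 = {
    "A": (-1.402, -0.810, 0.038, 0.001),
    "B": (-0.402, 0.173, 0.015, 0.004),
}


def generate_covariates(n, rng):
    """Rows of (intercept, gender, age, cholesterol) for n patients."""
    rows = []
    for _ in range(n):
        gender = rng.randint(0, 1)       # Bernoulli(1/2)
        age = rng.randint(30, 75)        # discrete uniform on 30..75
        chol = rng.gauss(200.0, 20.0)    # assumes 20 is a standard deviation
        rows.append((1.0, gender, age, chol))
    return rows


def logistic_cdf(x):
    return 1.0 / (1.0 + exp(-x))


def response_probs(rows, theta):
    """Success probability of each patient under parameter vector theta."""
    return [logistic_cdf(sum(t * v for t, v in zip(theta, row))) for row in rows]


rng = random.Random(160)  # arbitrary seed
Z = generate_covariates(200, rng)  # generated once, then held fixed
p_success = {k: response_probs(Z, theta) for k, theta in THETA_MODEL_2.items()}
print("mean success probability on A:", round(sum(p_success["A"]) / len(Z), 3))
print("mean success probability on B:", round(sum(p_success["B"]) / len(Z), 3))
```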

Table 2 shows the results under the null hypoth- 
esis (Model 1). We see that all rules produce bal- 
anced allocations. CARA 1, CARA 3 and CARA 4 
procedures are slightly anticonservative, with a type 
I error rate of 0.06, while the procedures CARA 2 
and CARA 5 preserve the nominal significance level 
of 0.05. Pocock and Simon's procedure is the least
variable among the eight rules considered; the other
procedures are almost identical in terms of variabil- 
ity of allocation proportions. 

Tables 3 and 4 show the results for Models 2 and 3, 
respectively. The conclusions are similar in the two 
cases, and so we will focus on Model 2. Balanced de- 
signs equalize the treatment assignments very well. 
As expected, the stratified blocks and Pocock and 
Simon's procedure are less variable than complete 
randomization. Similar conclusions about balancing 
properties of the designs apply to balancing with 
respect to the continuous covariates. The average
power is 81% for the stratified blocks and Pocock
and Simon's procedure, and 80% for complete randomization.

Let us now examine the performance of CARA 
procedures. All CARA procedures are more variable 
than the stratified blocks and Pocock and Simon's 
method, but a little less variable than complete ran- 
domization. In addition, all CARA procedures do a 
good job in terms of balancing the distributions of 
the continuous covariates [estimated $d_{\mathrm{KS}}(z_2) = 0.13$
(S.D. = 0.04) versus 0.14 (S.D. = 0.04) for complete 
randomization]. CARA 2, CARA 3 and CARA 5 
procedures are closest to the balanced design. The 
simulated allocation proportions for treatment A 
and the corresponding standard deviations are 0.48 
(0.03) for CARA 2, and 0.48 (0.03) for CARA 3, 
and 0.47 (0.03) for CARA 5 procedure. These three 
CARA procedures have average power of 81%, same 
as for stratified blocks and Pocock and Simon's pro- 
cedure, but at the same time they yield two fewer 
failures than the balanced designs. CARA 4 proce- 
dure has the power of 80% (same as for complete 
randomization), but it has, on average, four fewer 
failures than the balanced designs. CARA 1 proce- 
dure is the most skewed: the simulated allocation 
proportion for treatment A and the standard devia- 
tion is 0.40 (0.04), and it results, on average, in six 
fewer treatment failures than in the balanced design 
case. On the other hand, it is less powerful than 
balanced designs (the average power is 76%). 

The overall conclusion is that CARA procedures 
may be a good alternative to covariate-adaptive pro- 
cedures targeting balanced allocations in the nonlin- 
ear response case. Although incorporating responses 
in randomization induces additional variability of al- 
location proportions, which may potentially reduce 
power, one can see from our simulations that such 
an impact is not dramatic. 
Table 2

Simulation results for Model 1 with $\theta_A = \theta_B$ and $n = 200$

Procedure   $N_A(n)/n$ (S.D.)   $N_{A0}(n)/N_0(n)$ (S.D.)   $d_{\mathrm{KS}}(z_2)$ (S.D.)   Err. rate   $F(n)$ (S.D.)

CRD         0.50 (0.03)         0.50 (0.05)                 0.12 (0.04)                     0.05        90 (6)
SPBD        0.50 (0.03)         0.50 (0.04)                 0.12 (0.03)                     0.05        90 (6)
P-S         0.50 (0.00)         0.50 (0.01)                 0.10 (0.03)                     0.05        90 (6)
CARA 1      0.50 (0.03)         0.50 (0.04)                 0.11 (0.03)                     0.06        90 (6)
CARA 2      0.50 (0.03)         0.50 (0.04)                 0.12 (0.03)                     0.05        90 (6)
CARA 3      0.50 (0.02)         0.50 (0.04)                 0.11 (0.03)                     0.06        90 (6)
CARA 4      0.50 (0.02)         0.50 (0.04)                 0.12 (0.03)                     0.06        90 (6)
CARA 5      0.50 (0.02)         0.50 (0.04)                 0.12 (0.04)                     0.05        90 (6)

For CARA procedures, it is essential that the first 
allocations to treatment groups are made by us- 
ing some covariate-adaptive procedure or the strat- 
ified block design, so that some data accrue and 
one can estimate the unknown model parameters 
with reasonable accuracy. From numerical experi- 
ments we have found that at least 80 patients must 
be randomized to treatment groups before m.l.e.'s 
can be computed. Alternatively, one can check af- 
ter each allocation the convergence of the iteratively 
reweighted least squares algorithm for fitting the
logistic model, as Rosenberger, Vidyashankar and
Agarwal (2001) did. However, due to the slow con- 
vergence of m.l.e.'s, we have found that it is bet- 
ter, first, to achieve reasonable quality estimators by 
using a covariate-adaptive randomization procedure 
with good balancing properties (such as Pocock and 
Simon's method). 

From our simulations one can see that there are
CARA procedures (such as the CARA 4 procedure) which
have the same average power as complete randomization,
but at the same time they result in three
to four fewer failures than the balanced allocations.
Such extra ethical savings together with high power
for showing treatment efficacy can be a good reason
for using CARA procedures to design efficient and
more ethically attractive clinical trials.

Table 3

Simulation results for Model 2 with $\alpha_A - \alpha_B = -1.0$ and $n = 200$

Procedure   $N_A(n)/n$ (S.D.)   $N_{A0}(n)/N_0(n)$ (S.D.)   $d_{\mathrm{KS}}(z_2)$ (S.D.)   Power   $F(n)$ (S.D.)

CRD         0.50 (0.04)         0.49 (0.05)                 0.12 (0.04)                     0.80    62 (6)
SPBD        0.50 (0.03)         0.50 (0.04)                 0.12 (0.03)                     0.81    62 (6)
P-S         0.50 (0.01)         0.50 (0.01)                 0.10 (0.03)                     0.81    62 (6)
CARA 1      0.40 (0.04)         0.45 (0.04)                 0.12 (0.03)                     0.76    56 (6)
CARA 2      0.48 (0.03)         0.49 (0.04)                 0.12 (0.03)                     0.81    60 (6)
CARA 3      0.48 (0.03)         0.49 (0.04)                 0.12 (0.03)                     0.81    60 (6)
CARA 4      0.45 (0.03)         0.48 (0.04)                 0.12 (0.03)                     0.80    58 (6)
CARA 5      0.47 (0.03)         0.50 (0.04)                 0.12 (0.04)                     0.81    60 (6)

Table 4

Simulation results for Model 3 with $\alpha_A - \alpha_B = -1.25$ and $n = 160$

Procedure   $N_A(n)/n$ (S.D.)   $N_{A0}(n)/N_0(n)$ (S.D.)   $d_{\mathrm{KS}}(z_2)$ (S.D.)   Power   $F(n)$ (S.D.)

CRD         0.50 (0.04)         0.49 (0.05)                 0.14 (0.04)                     0.89    54 (6)
SPBD        0.50 (0.01)         0.50 (0.01)                 0.12 (0.03)                     0.89    54 (6)
P-S         0.50 (0.01)         0.50 (0.01)                 0.11 (0.03)                     0.90    54 (6)
CARA 1      0.39 (0.04)         0.43 (0.04)                 0.13 (0.04)                     0.86    50 (6)
CARA 2      0.47 (0.03)         0.48 (0.04)                 0.13 (0.04)                     0.90    53 (6)
CARA 3      0.48 (0.03)         0.48 (0.04)                 0.13 (0.04)                     0.90    54 (6)
CARA 4      0.44 (0.03)         0.45 (0.04)                 0.13 (0.04)                     0.89    51 (6)
CARA 5      0.47 (0.02)         0.50 (0.03)                 0.12 (0.03)                     0.91    53 (5)

10. DISCUSSION 

The design of clinical trials has become a rote ex- 
ercise, often driven by regulatory constraints. Boiler- 
plate design sections in protocols and grant propos- 
als are routinely presented to steering committees, 
review committees, and data and safety monitoring 
boards. It is not uncommon for the randomization 
section of a protocol to state "double-blinded ran- 
domization will be performed" with no further de- 
tails. The fact that randomization is rarely if ever 
used as a basis for inference means that the partic- 
ular randomization sequence is not relevant in the 
analysis, with the exception that stratified designs 
typically lead to stratified tests. Balance among im- 
portant baseline covariates is seen to be an essential 
cosmetic component of the clinical trial, and many 
statisticians recommend adjusting for imbalanced 
covariates following the trial, even if such analyses 
were not planned in the design phase. While effi- 
ciency is usually gauged by a sample size formula, 
the role that covariates play in efficiency, and the 
idea that imbalances may sometimes lead to better 
efficiency and more patients assigned to the supe- 
rior treatment, are not generally considered in the 
design phase of typical clinical trials. 

In clinical trials with normally distributed out- 
comes, where it is assumed that the variability of 
the outcomes is similar across treatments, a bal- 
anced design across treatments and covariates will 
be the most efficient. In these cases, if there are sev- 
eral important covariates, stratification can be em- 
ployed successfully, and if there are many covariates 
deemed of sufficient importance, covariate-adaptive 
randomization can be used to create balanced, and 
therefore efficient, designs. 
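
The efficiency claim for homoscedastic normal outcomes can be checked with a small calculation. The following is a minimal sketch that ignores covariates and considers only the split between two arms; the function names and the sample size of 100 are illustrative assumptions, not taken from the paper. It uses the standard fact that the variance of the difference in sample means is sigma_A^2/n_A + sigma_B^2/n_B.

```python
def var_diff_means(n_a, n_b, sigma_a=1.0, sigma_b=1.0):
    """Variance of the difference in sample means for two independent arms."""
    return sigma_a ** 2 / n_a + sigma_b ** 2 / n_b


def best_split(n_total, sigma_a=1.0, sigma_b=1.0):
    """Search every integer split of n_total and return the size of arm A
    that minimizes the variance of the estimated treatment difference."""
    return min(
        range(1, n_total),
        key=lambda n_a: var_diff_means(n_a, n_total - n_a, sigma_a, sigma_b),
    )


n_total = 100  # hypothetical trial size
print(best_split(n_total))  # 50: with equal variances the balanced split wins

# Relative efficiency of an unbalanced 70:30 split versus the 50:50 split.
print(var_diff_means(50, 50) / var_diff_means(70, 30))
```

With equal standard deviations, any departure from equal allocation inflates the variance of the treatment comparison, which is the sense in which balance and efficiency coincide in this setting.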

However, as we have seen, these simple ideas break 
down when there are heterogeneous variances, in- 
cluding those found in commonly performed trials 
with binary responses or survival responses. The 
good news is that there are new randomization tech- 
niques that can be incorporated in the design stage 
that can lead to more efficient and more ethically 
attractive clinical trials. These randomization tech- 
niques are based on the optimal design of experi- 
ments and also tend to place more patients on the 
better treatment (Zhang et al., 2007). While more 
work needs to be done on the properties of these 
procedures, we agree with Senn's comments that ef- 
ficiency is much more important than cosmetic bal- 
ance. 
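
To see why balance stops being optimal once variances differ, consider binary responses, where the variance of each arm depends on its success probability. The sketch below computes the classical Neyman allocation, which minimizes the variance of the difference in sample proportions. It is swapped in here as a simple illustration and is not the CARA procedure of Zhang et al. (2007); the success probabilities 0.3 and 0.1 and the function names are assumptions chosen for the example. For other response probabilities the Neyman allocation need not favor the better arm.

```python
import math


def neyman_proportion(p_a, p_b):
    """Proportion of patients on arm A under Neyman allocation, which is
    proportional to the binary-response standard deviation of each arm."""
    sd_a = math.sqrt(p_a * (1 - p_a))
    sd_b = math.sqrt(p_b * (1 - p_b))
    return sd_a / (sd_a + sd_b)


def var_diff_proportions(rho, n_total, p_a, p_b):
    """Variance of the difference in sample proportions when a fraction rho
    of n_total patients is on arm A (sample sizes treated as continuous)."""
    return (p_a * (1 - p_a) / (rho * n_total)
            + p_b * (1 - p_b) / ((1 - rho) * n_total))


p_a, p_b, n_total = 0.3, 0.1, 100  # hypothetical success probabilities
rho = neyman_proportion(p_a, p_b)
print(round(rho, 3))  # fraction allocated to arm A
print(var_diff_proportions(0.5, n_total, p_a, p_b))  # balanced allocation
print(var_diff_proportions(rho, n_total, p_a, p_b))  # Neyman allocation
```

When the two success probabilities are equal the formula returns 0.5, recovering the balanced design as a special case.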

The design of clinical trials is as important as the 
analysis of clinical trials. Ethical considerations and 
efficiency should dictate the randomization proce- 
dure used; careful selection of a good design can save 
time, money, and in some cases patients' lives. As 
Hu and Rosenberger (2006) point out, modern infor- 
mation technology has progressed to the point where 
logistical difficulties of implementing more complex 
randomization procedures are no longer an issue. 
Careful design involves an understanding of both 
the theoretical properties of a design in general, and 
simulated properties under a variety of standard to 
worst-case models. In some cases, the trade-offs in 
patient benefits and efficiency are so modest com- 
pared to the relative gravity of the outcome that 
standard balanced designs may be acceptable. How- 
ever, when outcomes are grave and balanced de- 
signs may produce severe inefficiency or too many 
patients assigned to the inferior treatment, careful 
design is essential. 
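
Examining simulated properties under a range of models can be sketched as follows. This is a minimal Monte Carlo example under stated assumptions: binary responses, a fixed allocation probability to arm A (complete randomization, not any of the adaptive procedures reviewed here), a two-sided Wald test at the nominal 5% level, and scenario values that are purely hypothetical. It reports the estimated rejection rate and the mean number of failures for each scenario and allocation.

```python
import math
import random


def simulate_trial(n_total, p_a, p_b, prob_a, rng):
    """Simulate one trial; return (reject_null, number_of_failures)."""
    n = {"A": 0, "B": 0}  # patients per arm
    s = {"A": 0, "B": 0}  # successes per arm
    for _ in range(n_total):
        arm = "A" if rng.random() < prob_a else "B"
        n[arm] += 1
        s[arm] += rng.random() < (p_a if arm == "A" else p_b)
    failures = n_total - s["A"] - s["B"]
    if min(n.values()) == 0:
        return False, failures  # no comparison possible with an empty arm
    pa_hat, pb_hat = s["A"] / n["A"], s["B"] / n["B"]
    se = math.sqrt(pa_hat * (1 - pa_hat) / n["A"]
                   + pb_hat * (1 - pb_hat) / n["B"])
    reject = se > 0 and abs(pa_hat - pb_hat) / se > 1.96
    return reject, failures


def operating_characteristics(n_total, p_a, p_b, prob_a, n_sims=1000, seed=1):
    """Estimate the rejection rate and mean failures over n_sims trials."""
    rng = random.Random(seed)
    results = [simulate_trial(n_total, p_a, p_b, prob_a, rng)
               for _ in range(n_sims)]
    power = sum(r for r, _ in results) / n_sims
    mean_failures = sum(f for _, f in results) / n_sims
    return power, mean_failures


# Hypothetical scenarios: no effect, moderate effect, large effect.
for p_a, p_b in [(0.5, 0.5), (0.7, 0.5), (0.8, 0.4)]:
    for prob_a in (0.5, 0.6):  # balanced versus a fixed unbalanced allocation
        power, fails = operating_characteristics(100, p_a, p_b, prob_a)
        print(p_a, p_b, prob_a, round(power, 3), round(fails, 1))
```

The same loop could wrap any randomization procedure in place of the fixed allocation probability, so that power and expected failures are compared across designs before a protocol is finalized.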